Outlier-robust estimation of state-space models using a penalized approach


Rajan Shankar¹, Garth Tarr¹, Ines Wilms², Jakob Raymaekers³

¹University of Sydney, ²Maastricht University, ³University of Antwerp

Motivation

State-space model

\[\begin{align} \mathbf x_t &= \Phi \mathbf x_{t-1} + \mathbf w_t \\ \mathbf y_t &= A \mathbf x_t + \mathbf v_t \\ \end{align}\]
  • \(\mathbf x_t\): state vector
  • \(\Phi\): state transition matrix
  • \(\mathbf w_t \sim N(\mathbf 0, \Sigma_\mathbf{w})\)
  • \(\mathbf y_t\): observation vector
  • \(A\): observation / measurement matrix
  • \(\mathbf v_t \sim N(\mathbf 0, \Sigma_\mathbf{v})\)
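
The model above can be simulated in a few lines of Python; this is a minimal sketch (the function name and arguments are illustrative, not from the source):

```python
import numpy as np

def simulate_ssm(Phi, A, Sigma_w, Sigma_v, x0, n, rng):
    """Simulate n steps of the linear-Gaussian state-space model:
    x_t = Phi x_{t-1} + w_t,  y_t = A x_t + v_t."""
    p, q = Phi.shape[0], A.shape[0]
    x = np.empty((n, p))  # states
    y = np.empty((n, q))  # observations
    x_prev = x0
    for t in range(n):
        # state equation: propagate and add process noise w_t ~ N(0, Sigma_w)
        x_prev = Phi @ x_prev + rng.multivariate_normal(np.zeros(p), Sigma_w)
        x[t] = x_prev
        # observation equation: project and add measurement noise v_t ~ N(0, Sigma_v)
        y[t] = A @ x_prev + rng.multivariate_normal(np.zeros(q), Sigma_v)
    return x, y
```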

Example: Object tracking

\[\begin{align} \mathbf x_t &= \Phi \mathbf x_{t-1} + \mathbf w_t \\ \mathbf y_t &= A \mathbf x_t + \mathbf v_t \\ \end{align}\]
  • \(\mathbf x_t = \left[\begin{matrix}\text{pos}_{1,t} \\ \text{pos}_{2,t} \\ \text{vel}_{1,t} \\ \text{vel}_{2,t}\end{matrix}\right],\ \ \ \ \mathbf y_t = \left[\begin{matrix}\text{pos}_{1,t} \\ \text{pos}_{2,t}\end{matrix}\right]\)
  • \(\Phi = \left[\begin{matrix}1&0&0.1&0 \\ 0&1&0&0.1 \\ 0&0&1&0 \\ 0&0&0&1\end{matrix}\right]\)
  • \(A = \left[\begin{matrix}1&0&0&0 \\ 0&1&0&0 \end{matrix}\right]\)
  • \(\Sigma_\mathbf w = \theta_1 I_{4\times 4}\)
  • \(\Sigma_\mathbf v = \theta_2 I_{2\times 2}\)
  • \(\color{#e64626}{\theta_1 = 0.1}\)
  • \(\color{#e64626}{\theta_2 = 1}\)

Aim: to estimate \(\boldsymbol\theta = (\color{#e64626}{\theta_1},\color{#e64626}{\theta_2})\)
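
The tracking matrices above can be assembled directly in NumPy; this sketch uses a time step of 0.1 (implied by the off-diagonal entries of \(\Phi\)) and the stated true values of \(\theta_1, \theta_2\) (variable names are illustrative):

```python
import numpy as np

dt = 0.1  # time step, matching the 0.1 entries in Phi
# constant-velocity transition: position += dt * velocity, velocity unchanged
Phi = np.block([[np.eye(2), dt * np.eye(2)],
                [np.zeros((2, 2)), np.eye(2)]])
# we observe only the two position coordinates
A = np.hstack([np.eye(2), np.zeros((2, 2))])
theta1, theta2 = 0.1, 1.0  # true noise variances from the example
Sigma_w = theta1 * np.eye(4)  # process noise covariance
Sigma_v = theta2 * np.eye(2)  # measurement noise covariance
```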

Parameter estimation

Given a parameter vector \(\boldsymbol \theta\):

  • \(\mathbf {\hat y}_{t|t-1}(\boldsymbol\theta)\) is the prediction for observation \(t\) given data up to time \(t-1\)
  • \(\mathbf r_t(\boldsymbol\theta) := \mathbf y_{t} - \mathbf {\hat y}_{t|t-1}(\boldsymbol\theta)\) is the residual
  • \(S_{t|t-1}(\boldsymbol\theta)\) is the prediction variance

\(\mathbf {\hat y}_{t|t-1}(\boldsymbol\theta)\) and \(S_{t|t-1}(\boldsymbol\theta)\) are computed using the Kalman filter

\[ \min_{\boldsymbol\theta}\sum_{t=1}^n \left\{\log \left|S_{t|t-1}(\boldsymbol\theta)\right| + \mathbf r_t(\boldsymbol\theta)^\top S_{t|t-1}^{-1}(\boldsymbol\theta) \mathbf r_t(\boldsymbol\theta)\right\} \]

The estimate for \(\boldsymbol \theta\) can be found using standard optimisation routines.
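
A minimal sketch of this procedure, assuming the scalar-variance parameterisation \(\Sigma_\mathbf{w} = \theta_1 I\), \(\Sigma_\mathbf{v} = \theta_2 I\) from the example; the Kalman recursions are standard, and the initialisation \(P_0 = 10I\), \(\hat{\mathbf x}_0 = \mathbf 0\) is an assumption:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(theta, y, Phi, A):
    """Gaussian negative log-likelihood (up to constants) via the Kalman filter."""
    p, q = Phi.shape[0], A.shape[0]
    Sigma_w = theta[0] * np.eye(p)
    Sigma_v = theta[1] * np.eye(q)
    x_hat = np.zeros(p)   # filtered state mean (assumed start)
    P = 10.0 * np.eye(p)  # filtered state covariance (assumed diffuse start)
    nll = 0.0
    for y_t in y:
        # prediction step
        x_pred = Phi @ x_hat
        P_pred = Phi @ P @ Phi.T + Sigma_w
        S = A @ P_pred @ A.T + Sigma_v  # prediction variance S_{t|t-1}
        r = y_t - A @ x_pred            # residual r_t
        S_inv = np.linalg.inv(S)
        # accumulate log|S| + r' S^{-1} r, the summand in the objective
        nll += np.log(np.linalg.det(S)) + r @ S_inv @ r
        # update step
        K = P_pred @ A.T @ S_inv
        x_hat = x_pred + K @ r
        P = (np.eye(p) - K @ A) @ P_pred
    return nll

# standard optimisation routine with positivity bounds on the variances, e.g.:
# res = minimize(neg_log_lik, x0=[1.0, 1.0], args=(y, Phi, A),
#                bounds=[(1e-6, None), (1e-6, None)])
```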

Outliers

What if the observation process is corrupted with outliers?

Outlier generation

Robust estimation

Minimise over \(\boldsymbol\theta, \boldsymbol\gamma_{t}\) (\(t=1,\dots,n\)):

\[ \sum_{t=1}^n \left\{\color{#e64626}{1_{\{\boldsymbol\gamma_t = \mathbf0\}}}\log \left|S_{t|t-1}(\boldsymbol\theta)\right| + \left(\mathbf r_t(\boldsymbol\theta) - \color{#e64626}{\boldsymbol\gamma_t}\right)^\top S_{t|t-1}^{-1}(\boldsymbol\theta) \left(\mathbf r_t(\boldsymbol\theta) - \color{#e64626}{\boldsymbol\gamma_t}\right)\right\} + \color{#e64626}{\sum_{t=1}^n P(\boldsymbol{\gamma}_t; \lambda)} \]

  • \(P\) is the hard penalty:

\[ P(\boldsymbol{\gamma}_t; \lambda) =\begin{cases}\lambda^2 & \text{if } \boldsymbol{\gamma}_t \neq \mathbf 0 \\ 0 & \text{otherwise} \end{cases} \]

  • \(\lambda\) is a tuning parameter
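
The penalized objective can be evaluated term by term once the residuals and prediction variances are available; this sketch assumes they are precomputed (the function and argument names are illustrative):

```python
import numpy as np

def penalized_objective(residuals, S_list, gammas, lam):
    """Penalized objective: the log|S| term only enters when gamma_t = 0,
    and each nonzero gamma_t incurs the hard penalty lambda^2."""
    total = 0.0
    for r, S, g in zip(residuals, S_list, gammas):
        d = r - g  # shifted residual r_t - gamma_t
        if np.all(g == 0):
            total += np.log(np.linalg.det(S))  # indicator term: gamma_t = 0
        else:
            total += lam ** 2                  # hard penalty P(gamma_t; lambda)
        total += d @ np.linalg.inv(S) @ d
    return total
```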

Robust estimation

Alternating estimation algorithm. At iteration \(k\):

  1. Estimate \(\boldsymbol\theta^{(k)}\) given all \(\boldsymbol\gamma^{(k)}_{t}\) by minimising:

\[ \sum_{t=1}^n \left\{1_{\{\boldsymbol\gamma^{(k)}_t = 0\}}\log \left|S_{t|t-1}(\boldsymbol\theta)\right| + \left(\mathbf r_t(\boldsymbol\theta) - \boldsymbol\gamma^{(k)}_t\right)^\top S_{t|t-1}^{-1}(\boldsymbol\theta) \left(\mathbf r_t(\boldsymbol\theta) - \boldsymbol\gamma^{(k)}_t\right)\right\} \]

  2. Update each \(\boldsymbol \gamma^{(k+1)}_{t}\) given \(\boldsymbol\theta^{(k)}\) via hard-thresholding:

\[ \boldsymbol \gamma_t^{(k+1)} = \begin{cases} \mathbf r_t(\boldsymbol\theta^{(k)}) & \text{if } \sqrt{\mathbf r_t(\boldsymbol\theta^{(k)})^\top S_{t|t-1}^{-1}(\boldsymbol\theta^{(k)}) \mathbf r_t(\boldsymbol\theta^{(k)})} > \lambda \\ \mathbf 0 & \text{otherwise}\end{cases} \]

  3. Repeat until convergence.
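
The hard-thresholding update is a one-liner per time point; a sketch assuming the residuals and prediction variances have been precomputed with the current \(\boldsymbol\theta^{(k)}\) (names are illustrative):

```python
import numpy as np

def update_gammas(residuals, S_list, lam):
    """Hard-thresholding: set gamma_t = r_t when the Mahalanobis norm of r_t
    (w.r.t. S_{t|t-1}) exceeds lambda, i.e. the point is flagged as an outlier;
    otherwise gamma_t = 0."""
    gammas = []
    for r, S in zip(residuals, S_list):
        dist = np.sqrt(r @ np.linalg.inv(S) @ r)  # Mahalanobis norm of r_t
        gammas.append(r.copy() if dist > lam else np.zeros_like(r))
    return gammas
```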

Choosing \(\lambda\)

Assessing fit


More examples

  • Outliers generated at 3 different distances (10, 20, 50 units)
  • Outliers generated in a cluster

Summary

  • Estimate parameters of a state-space model in the presence of observational outliers
  • Simulation experiments are currently underway to investigate the estimator's statistical properties in three settings:
    • Object tracking
    • Autoregressive processes
    • Time-varying regression

Contact